Exploration Methods for Connectionist Q-learning in Bomberman
Authors
Abstract
In this paper, we investigate which exploration method yields the best performance in the game Bomberman, in which the controlled agent has to kill its opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game using Q-learning. We introduce two novel exploration strategies, Error-Driven-ε and Interval-Q, which base their explorative behavior on the temporal-difference error of Q-learning. The learning capabilities of these exploration strategies are compared to five existing methods: Random-Walk, Greedy, ε-Greedy, Diminishing ε-Greedy, and Max-Boltzmann. The results show that the methods that combine exploration with exploitation perform much better than the Random-Walk and Greedy strategies, which only explore or only exploit. Furthermore, the results show that Max-Boltzmann exploration performs best overall among the compared techniques. The Error-Driven-ε strategy also performs very well, but suffers from unstable learning behavior.
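As a point of reference, below is a minimal sketch of two of the compared selection rules, ε-Greedy and Max-Boltzmann, assuming the Q-values for the current state come from the network's forward pass. The function names, the NumPy-based sampling, and the temperature parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=None):
    """ε-Greedy: explore uniformly at random with probability ε,
    otherwise pick the action with the highest Q-value."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def max_boltzmann(q_values, epsilon, temperature=1.0, rng=None):
    """Max-Boltzmann: exploit greedily with probability 1 - ε; on the
    remaining exploration steps, sample from a Boltzmann (softmax)
    distribution over the Q-values, so promising actions are tried
    more often than clearly bad ones."""
    rng = rng or np.random.default_rng()
    if rng.random() >= epsilon:
        return int(np.argmax(q_values))
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                      # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))
```

The difference between the two is what happens on an exploration step: ε-Greedy treats all actions as equally worth trying, while Max-Boltzmann biases exploration toward actions with higher estimated Q-values.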
Similar Papers
Efficient Exploration in Reinforcement Learning
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. Whil...
Memory-guided Exploration in Reinforcement Learning
The life-long learning architecture attempts to create an adaptive agent through the incorporation of prior knowledge over the lifetime of a learning agent. Our paper focuses on task transfer in reinforcement learning and specifically in Q-learning. There are three main model-free methods for performing task transfer in Q-learning: direct transfer, soft transfer, and memory-guided exploration. In ...
Connectionist Q-learning in Robot Control Task
The Q-Learning algorithm suggested by Watkins in 1989 [1] belongs to a group of reinforcement learning algorithms. Reinforcement learning in robot control tasks takes the form of a multi-step adaptation procedure. The main feature of this technique is that, during learning, the system is not shown how to act in a specific situation. Instead, learning develops by trial and error using re...
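The snippet above refers to Watkins' Q-learning update rule; a minimal tabular sketch of that update follows. The dict-of-dicts table and the values of alpha and gamma are illustrative assumptions, and the paper summarized here approximates Q with a connectionist network rather than a table.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One step of Watkins' Q-learning on a tabular value function.

    Q is assumed to be a dict mapping each state to a dict of action
    values (an illustrative choice; a network can play the same role).
    """
    # Temporal-difference error: reward plus discounted best next-state
    # value, minus the current estimate for the action that was taken.
    td_error = r + gamma * max(Q[s_next].values()) - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error
```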